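One challenge in evaluating current sequence-level or dialogue-level chatbots, such as empathetic open-domain conversation models, is determining whether the chatbot behaves in an emotionally consistent way. Recent work has evaluated only contextual coherence between dialogues, language fluency, response diversity, or logical self-consistency. This work proposes training an evaluator to determine the emotional consistency of chatbots.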
Graph neural networks (GNNs), the de facto model class for representation learning on graphs, are built upon the multi-layer perceptron (MLP) architecture with additional message passing layers that allow features to flow across nodes. While conventional wisdom largely attributes the success of GNNs to their advanced expressivity for learning desired functions on nodes' ego-graphs, we conjecture that this is not the main cause of GNNs' superiority in node prediction tasks. This paper pinpoints the major source of GNNs' performance gain as their intrinsic generalization capability, by introducing an intermediate model class dubbed P(ropagational)MLP, which is identical to a standard MLP in training but adopts the GNN architecture in testing. Intriguingly, we observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts across ten benchmarks and different experimental settings, despite the fact that PMLPs share the same (trained) weights with poorly performing MLPs. This critical finding opens the door to a new perspective for understanding the power of GNNs, and allows GNNs and MLPs to be bridged in order to dissect their generalization behaviors. As an initial step in analyzing PMLP, we show that its essential difference from MLP in the infinite-width limit lies in the NTK feature map in the post-training stage. Moreover, though MLP and PMLP cannot extrapolate non-linear functions for extreme OOD data, PMLP has more freedom to generalize near the training support.
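The train-as-MLP, test-as-GNN idea can be illustrated with a toy sketch. It is not the authors' implementation: the mean aggregation with self-loops, the placement of the propagation step after each linear layer, and all function names are assumptions chosen for illustration. The point it shows is that the same weights serve both forward passes, and only the test-time pass mixes features across neighbors.

```python
# Toy sketch (assumed design): one set of weights, two forward passes.

def linear(features, weight):
    """Apply a shared weight matrix (in_dim x out_dim) to every node's features."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*weight)]
            for row in features]

def relu(features):
    return [[max(0.0, x) for x in row] for row in features]

def propagate(features, neighbors):
    """Mean-aggregate each node's features over itself and its neighbors."""
    out = []
    for node, row in enumerate(features):
        group = [node] + neighbors[node]
        out.append([sum(features[n][d] for n in group) / len(group)
                    for d in range(len(row))])
    return out

def forward(features, weights, neighbors=None):
    """Plain MLP pass when neighbors is None; propagation-augmented pass otherwise."""
    hidden = features
    for i, weight in enumerate(weights):
        hidden = linear(hidden, weight)
        if neighbors is not None:
            hidden = propagate(hidden, neighbors)  # adds no new parameters
        if i < len(weights) - 1:
            hidden = relu(hidden)
    return hidden

# Path graph 0 - 1 - 2 with one scalar feature per node (made-up values).
node_features = [[1.0], [0.0], [-1.0]]
adjacency = {0: [1], 1: [0, 2], 2: [1]}
shared_weights = [[[2.0]]]  # a single 1x1 layer, standing in for trained weights

mlp_output = forward(node_features, shared_weights)              # training-time view
pmlp_output = forward(node_features, shared_weights, adjacency)  # test-time view
```

In this sketch the weights would be fitted using only the first call; the second call reuses them unchanged and smooths each node's output toward its neighbors.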
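The PCL detection task aims to identify and categorize language that is patronizing or condescending towards vulnerable communities in the general media. Generic text classification methods perform disappointingly on this task. Targeting the PCL detection problem in SemEval-2022 Task 4, in this paper we introduce our team's solution, which exploits the power of prompt-based learning for paragraph classification. We reformulate the task as an appropriate cloze prompt and use a pre-trained masked language model to fill the cloze slot. For the two subtasks, binary classification and multi-label classification, a DeBERTa model is adopted and fine-tuned to predict the label words of task-specific prompts. On the evaluation dataset, our approach achieves an F1 score of 0.6406 for binary classification; for multi-label classification, it achieves a macro-F1 score of 0.4689 and ranks first on the leaderboard.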
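The cloze reformulation described above can be sketched as follows. The template wording, the label words, and the `[MASK]` placeholder are hypothetical choices for illustration, not the prompts used by the team, and `mask_scores` stands in for whatever scores a masked language model assigns to candidate words at the cloze slot.

```python
# Minimal sketch of cloze-style prompting with a verbalizer (all names assumed).

MASK = "[MASK]"  # placeholder for the model's mask token
TEMPLATE = "{paragraph} Is this text patronizing? {mask}."  # hypothetical template
VERBALIZER = {"yes": 1, "no": 0}  # hypothetical label words -> class ids

def build_prompt(paragraph):
    """Wrap a paragraph in the cloze template, leaving one slot to fill."""
    return TEMPLATE.format(paragraph=paragraph.strip(), mask=MASK)

def predict_label(mask_scores):
    """Choose the class whose label word scores highest at the cloze slot.

    Words outside the verbalizer are ignored, however high they score.
    """
    best_word = max(VERBALIZER, key=lambda w: mask_scores.get(w, float("-inf")))
    return VERBALIZER[best_word]

prompt = build_prompt(" They need our help. ")
# Made-up scores; in practice these would come from the fine-tuned model.
label = predict_label({"yes": 2.1, "no": 0.3, "the": 5.0})
```

Restricting the decision to the label words is what turns mask filling into classification; for the multi-label subtask one such prompt per category would be a natural extension, though the source does not spell out that detail.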
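Machine reading comprehension (MRC) with unanswerable questions is a difficult NLP task, challenged by questions that cannot be answered from the passage. It has been observed that subtle literal changes often make an answerable question unanswerable, yet most MRC models fail to recognize such changes. To address this problem, in this paper we propose a span-based contrastive learning method (SPANCL), which explicitly contrasts answerable questions with their answerable and unanswerable counterparts at the answer-span level. With SPANCL, MRC models are forced to perceive crucial semantic changes from slight literal differences. Experiments on the SQuAD 2.0 dataset show that SPANCL improves baselines significantly, yielding absolute EM improvements of 0.86 to 2.14. Additional experiments also show that SPANCL is an effective way to make use of generated questions.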
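Model-based offline optimization with a dynamics-aware policy offers a new perspective on policy learning and out-of-distribution generalization: the learned policy can adapt to the different dynamics enumerated during training. Under the constraints of the offline setting, however, the learned models cannot mimic the real dynamics well enough to support reliable out-of-distribution exploration, which still prevents the policy from generalizing well. To narrow this gap, previous work coarsely ensembles randomly initialized models to better approximate the real dynamics. This practice is costly and inefficient, and it gives no guarantee on how well the learned models approximate the real dynamics, a property we call coverability in this paper. We actively address this issue by generating models with a provable ability to cover the real dynamics in an efficient and controllable way. To that end, we design a distance metric for dynamics models based on the occupancy of policies under those dynamics, and propose an algorithm that generates models to optimize their coverage of the real dynamics. We give a theoretical analysis of the model generation process and prove that our algorithm provides enhanced coverability. As a downstream task, we train a dynamics-aware policy with little or no conservative penalty, and experiments show that our algorithm outperforms prior offline methods on existing offline RL benchmarks. We also find that policies learned by our method have better zero-shot transfer performance, which implies better generalization.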
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even when the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
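As a toy illustration of the simpler of the two attacks, the sketch below stamps a fixed trigger onto raw images before they would be handed to a distillation procedure. The corner patch, its size and value, the relabeling to a target class, and the function names are all assumptions made for illustration; the abstract does not specify these details, and the distillation step itself is not shown.

```python
import copy

# Toy sketch (assumed details): stamp a trigger on raw data before distillation.

def add_trigger(image, patch_size=2, patch_value=1.0):
    """Return a copy of a 2D grayscale image with a square patch in the bottom-right corner."""
    stamped = copy.deepcopy(image)
    height, width = len(stamped), len(stamped[0])
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            stamped[r][c] = patch_value
    return stamped

def poison_dataset(images, labels, target_label, num_poisoned):
    """Stamp the trigger on the first num_poisoned samples and relabel them."""
    poisoned_images, poisoned_labels = [], []
    for i, (image, label) in enumerate(zip(images, labels)):
        if i < num_poisoned:
            poisoned_images.append(add_trigger(image))
            poisoned_labels.append(target_label)
        else:
            poisoned_images.append(copy.deepcopy(image))
            poisoned_labels.append(label)
    return poisoned_images, poisoned_labels

# Three blank 4x4 images with made-up labels.
clean_images = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
images, labels = poison_dataset(clean_images, [0, 1, 2], target_label=7, num_poisoned=1)
```

A fixed trigger of this kind is set once before distillation begins, which is the contrast the abstract draws with DOORPING, where the trigger is updated iteratively throughout distillation.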
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset for the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
This paper focuses on designing efficient models with few parameters and low FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when the same framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependencies and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1, surpassing SoTA CNN-/Transformer-based models while trading off model accuracy and efficiency well.
Benefiting from its ability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms keep existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, and inappropriate augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster while pushing away those from other clusters, by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
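The sample construction described above can be sketched in a few lines. This is a hedged illustration rather than the CCGC objective itself: the exact loss form (mean of `1 - cos` over positive pairs plus mean cosine between centers of different clusters), the equal weighting of the two terms, and all names are assumptions. Embeddings from the two views and the high-confidence cluster assignments are taken as given.

```python
import math

# Sketch (assumed loss form): cluster-guided positives and center-based negatives.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def cluster_center(vectors):
    """Coordinate-wise mean of a list of embedding vectors."""
    return [sum(col) / len(col) for col in zip(*vectors)]

def contrastive_loss(view1, view2, clusters):
    """clusters maps a cluster id to the indices of its high-confidence nodes."""
    # Positives: cross-view pairs of nodes from the same high-confidence cluster.
    pos_terms = [1.0 - cosine(view1[i], view2[j])
                 for members in clusters.values()
                 for i in members for j in members]
    # Negatives: cross-view pairs of centers from different clusters.
    centers1 = {k: cluster_center([view1[i] for i in m]) for k, m in clusters.items()}
    centers2 = {k: cluster_center([view2[i] for i in m]) for k, m in clusters.items()}
    neg_terms = [cosine(centers1[a], centers2[b])
                 for a in clusters for b in clusters if a != b]
    return sum(pos_terms) / len(pos_terms) + sum(neg_terms) / len(neg_terms)

# Made-up embeddings: two well-separated clusters, identical across both views.
separated = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
collapsed = [[1.0, 0.0]] * 4  # every node mapped to the same point
assignments = {0: [0, 1], 1: [2, 3]}

good_loss = contrastive_loss(separated, separated, assignments)
bad_loss = contrastive_loss(collapsed, collapsed, assignments)
```

Under this assumed form, well-separated clusters give a loss near zero, while collapsed embeddings are penalized by the center-level negative term even though every positive pair is perfectly aligned.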